A Strategy for Efficient Crawling of Rich Internet Applications
Abstract
This thesis studies the problem of crawling rich internet applications. These applications are built with advanced web technologies that make them more dynamic and enable better user experiences. In recent years, the popularity and importance of web applications have steadily increased, and they are now commonly used to complete essential tasks such as financial transactions. As a result, the need to crawl these applications goes beyond indexing content for search: applications must also be analyzed to detect security vulnerabilities and to assess accessibility. In this thesis, the challenges of crawling rich internet applications are discussed and an efficient strategy for crawling them is presented. We also use this strategy to develop a prototype tool for crawling AJAX-based applications.
Similar resources
A Statistical Approach for Efficient Crawling of Rich Internet Applications
Modern web technologies such as AJAX result in more responsive and usable web applications, sometimes called Rich Internet Applications (RIAs). Traditional crawling techniques are not sufficient for crawling RIAs. We present a new strategy for crawling RIAs. This strategy is based on the concept of “Model-Based Crawling” introduced in [3] and uses statistics accumulated during the cr...
Building Rich Internet Applications Models: Example of a Better Strategy
Crawling “classical” web applications is a problem that was addressed more than a decade ago. Efficient crawling of web applications that use advanced technologies such as AJAX (called Rich Internet Applications, RIAs) is still an open problem. Crawling is important not only for indexing content, but also for building models of the applications, which is necessary for automated testing, au...
GDist-RIA Crawler: A Greedy Distributed Crawler for Rich Internet Applications
Crawling web applications is important for indexing, accessibility, and security assessment. Crawling traditional web applications is an old problem, for which good and efficient solutions are known. Crawling Rich Internet Applications (RIAs) quickly and efficiently, however, is an open problem. Technologies such as AJAX and partial Document Object Model (DOM) updates only make the problem of craw...
Indexing Rich Internet Applications Using Components-Based Crawling
Automatic crawling of Rich Internet Applications (RIAs) is a challenge because client-side code modifies the client dynamically, fetching server-side data asynchronously. Most existing solutions model RIAs as state machines, with DOMs as states and JavaScript event executions as transitions. This approach fails when used with “real-life”, complex RIAs, because the size of the produced model is m...
Publication date: 2011